Gesture Assessment of Teachers in an Immersive Rehearsal Environment
Interactive training environments typically include feedback mechanisms designed to help trainees improve their performance through either guided reflection or self-reflection. When the training system deals with human-to-human communication, as one would find with a teacher, counselor, enterprise culture trainer, or cross-cultural trainer, such feedback needs to address all aspects of human communication. This means that, in addition to verbal communication, nonverbal messages must be captured and analyzed for semantic meaning. The goal of this dissertation is to employ machine-learning algorithms that semi-automate and, where supported, automate event tagging in training systems developed to improve human-to-human interaction. The specific context in which we prototype and validate these models is the TeachLivE teacher rehearsal environment developed at the University of Central Florida. The choice of this environment was governed by its availability, large user population, extensibility, and the existing reflection tools found within the AMITIES framework underlying the TeachLivE system. Our contributions include accuracy improvements to Visual Gesture Builder, Microsoft's existing data-driven gesture recognition utility. Using the proposed methodology and tracking sensors, we created a gesture database and used it to implement our online gesture recognition and feedback application. We also investigated multiple methods of feedback provision, including visual and haptic channels. The results of the user studies we conducted indicate the positive impact of the proposed feedback applications, and of informed body language, on teaching competency. In this dissertation, we describe the context in which the algorithms have been developed, the importance of recognizing nonverbal communication in this context, the means of providing semi- and fully-automated feedback associated with nonverbal messaging, and a series of preliminary studies developed to inform the research. Furthermore, we outline future research directions involving new case studies and multimodal annotation and analysis, in order to understand the synchrony of acoustic features and gestures in the teaching context.
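The feedback pipeline described above can be pictured as a small online loop: skeleton frames stream in from a tracking sensor, a classifier trained on the gesture database tags events, and the tags drive feedback. The sketch below illustrates that loop with a nearest-centroid classifier over windowed skeleton features; the gesture labels, window length, and classifier choice are illustrative assumptions, not the dissertation's actual models.

```python
from collections import deque

import numpy as np

# Illustrative class list; the real system's gesture vocabulary came from
# its recorded gesture database.
GESTURE_NAMES = ["open_stance", "closed_stance", "pointing"]

class OnlineGestureTagger:
    """Tags gestures from streaming skeleton frames via nearest-centroid matching."""

    def __init__(self, centroids: np.ndarray, window: int = 30):
        # centroids: (len(GESTURE_NAMES), d) prototype feature vectors,
        # assumed to be learned offline from labeled recordings.
        self.centroids = centroids
        self.buffer = deque(maxlen=window)

    def push_frame(self, joint_features: np.ndarray):
        """Accumulate one frame; emit (gesture, distance) once the window fills."""
        self.buffer.append(joint_features)
        if len(self.buffer) < self.buffer.maxlen:
            return None
        window_mean = np.mean(self.buffer, axis=0)
        dists = np.linalg.norm(self.centroids - window_mean, axis=1)
        best = int(np.argmin(dists))
        return GESTURE_NAMES[best], float(dists[best])
```

In a live session, each tag emitted by `push_frame` would be logged as a reflection event or surfaced as immediate visual or haptic feedback.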
Dyadic Movement Synchrony Estimation Under Privacy-preserving Conditions
Movement synchrony refers to the dynamic temporal connection between the
motions of interacting people. The applications of movement synchrony are
wide-ranging. For example, as a measure of coordination between teammates,
synchrony scores are often reported in sports. The autism community also
identifies movement synchrony as a key indicator of children's social and
developmental achievements. In general, raw video recordings are often used for
movement synchrony estimation, with the drawback that they may reveal people's
identities. Furthermore, such privacy concerns also hinder data sharing, a
major roadblock to fair comparison between different approaches in autism
research. To address the issue, this paper proposes an ensemble method for
movement synchrony estimation, one of the first deep-learning-based methods for
automatic movement synchrony assessment under privacy-preserving conditions.
Our method relies entirely on publicly shareable, identity-agnostic secondary
data, such as skeleton data and optical flow. We validate our method on two
datasets: (1) PT13 dataset collected from autism therapy interventions and (2)
TASD-2 dataset collected from synchronized diving competitions. In this
context, our method outperforms its counterpart approaches, both deep neural
networks and alternative methods.
Comment: IEEE ICPR 2022. 8 pages, 3 figures.
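As a rough illustration of estimating synchrony from identity-agnostic secondary data, the sketch below scores a dyad by correlating the two partners' motion energy computed from skeleton sequences; the per-joint-speed signal and windowed Pearson correlation are simplifying assumptions that stand in for the paper's deep ensemble.

```python
import numpy as np

def joint_speeds(skeleton: np.ndarray) -> np.ndarray:
    """skeleton: (T, J, 2) array of 2D joint positions -> (T-1, J) per-joint speeds."""
    return np.linalg.norm(np.diff(skeleton, axis=0), axis=-1)

def synchrony_score(skel_a: np.ndarray, skel_b: np.ndarray, window: int = 60) -> float:
    """Mean windowed Pearson correlation of the two dyad members' motion energy."""
    energy_a = joint_speeds(skel_a).mean(axis=1)  # one motion-energy value per frame
    energy_b = joint_speeds(skel_b).mean(axis=1)
    scores = []
    for start in range(0, len(energy_a) - window + 1, window):
        a = energy_a[start:start + window]
        b = energy_b[start:start + window]
        if a.std() > 0 and b.std() > 0:           # skip motionless windows
            scores.append(np.corrcoef(a, b)[0, 1])
    return float(np.mean(scores)) if scores else 0.0
```

Because the inputs are skeletons rather than raw video, such a score can be computed and shared without exposing participants' identities.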
Pose Uncertainty Aware Movement Synchrony Estimation via Spatial-Temporal Graph Transformer
Movement synchrony reflects the coordination of body movements between
interacting dyads. The estimation of movement synchrony has been automated by
powerful deep learning models such as transformer networks. However, instead of
designing a specialized network for movement synchrony estimation, previous
transformer-based works broadly adopted architectures from other tasks such as
human activity recognition. Therefore, this paper proposes a skeleton-based
graph transformer for movement synchrony estimation. The proposed model applies
ST-GCN, a spatial-temporal graph convolutional network, for skeleton
feature extraction, followed by a spatial transformer for spatial feature
generation. The spatial transformer is guided by a uniquely designed joint
position embedding shared between the same joints of interacting individuals.
We also incorporate a temporal similarity matrix into the temporal attention
computation to account for the intrinsic periodicity of body movements. In
addition, we exploit the confidence score associated with each joint, which
reflects the uncertainty of a pose and which previous works on movement
synchrony estimation have not sufficiently emphasized. Since transformer
networks demand a
significant amount of data to train, we constructed a dataset for movement
synchrony estimation using Human3.6M, a benchmark dataset for human activity
recognition, and pretrained our model on it using contrastive learning. We
further applied knowledge distillation to alleviate information loss introduced
by pose detector failure in a privacy-preserving way. We compared our method
with representative approaches on PT13, a dataset collected from autism therapy
interventions. Our method achieved an overall accuracy of 88.98% and surpassed
its counterparts by a wide margin while maintaining data privacy.
Comment: Accepted by the 24th ACM International Conference on Multimodal
Interaction (ICMI'22). 17 pages, 2 figures.
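One way to picture the temporal similarity matrix described above is as an additive bias on the attention logits, so that frames with similar poses (for example, repeated phases of a periodic movement) attend more strongly to each other. The PyTorch sketch below shows this under that assumption; the fusion rule and tensor shapes are illustrative, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def temporal_attention_with_similarity(q, k, v, frame_feats):
    """q, k, v: (T, d) per-frame projections; frame_feats: (T, f) pose features."""
    d = q.shape[-1]
    logits = q @ k.transpose(-1, -2) / d ** 0.5           # (T, T) standard attention
    sim = F.normalize(frame_feats, dim=-1)
    sim_matrix = sim @ sim.transpose(-1, -2)              # (T, T) frame similarity;
    weights = torch.softmax(logits + sim_matrix, dim=-1)  # similar (periodic) frames
    return weights @ v                                    # reinforce each other
```

Per-joint confidence scores could analogously be folded in as weights on the joint features before attention, reflecting the pose-uncertainty-aware design the title names.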
Social Visual Behavior Analytics for Autism Therapy of Children Based on Automated Mutual Gaze Detection
Social visual behavior, as a type of non-verbal communication, plays a
central role in studying social cognitive processes in interactive and complex
settings of autism therapy interventions. However, for social visual behavior
analytics in children with autism, collecting and evaluating gaze data manually
is challenging because it demands substantial time and effort from human
coders. In this paper, we introduce a social visual behavior analytics approach
by quantifying the mutual gaze performance of children receiving play-based
autism interventions using an automated mutual gaze detection framework. Our
analysis is based on a video dataset that captures and records social
interactions between children with autism and their therapy trainers (N = 28
observations, 84 video clips, 21 hours total duration). The effectiveness of our
framework was evaluated by comparing the mutual gaze ratio derived from the
mutual gaze detection framework with the human-coded ratio values. We analyzed
the mutual gaze frequency and duration across different therapy settings,
activities, and sessions. We created mutual gaze-related measures for social
visual behavior score prediction using multiple machine learning-based
regression models. The results show that our method provides mutual gaze
measures that reliably represent (or even replace) the human coders' hand-coded
social gaze measures and effectively evaluates and predicts ASD children's
social visual performance during the intervention. Our findings have
implications for social interaction analysis in small-group behavior
assessments in numerous co-located settings in (special) education and in the
workplace.
Comment: Accepted to the IEEE/ACM International Conference on Connected Health:
Applications, Systems and Engineering Technologies (CHASE) 202
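To make the mutual gaze measures concrete, the sketch below derives session-level ratio, frequency, and duration statistics from per-frame detector output; the specific measure definitions are plausible assumptions rather than the paper's exact formulas.

```python
import numpy as np

def mutual_gaze_measures(frames: np.ndarray, fps: float = 30.0) -> dict:
    """frames: boolean array, True where the detector flags mutual gaze."""
    frames = np.asarray(frames, dtype=bool)
    # Locate contiguous True runs (mutual-gaze episodes) via edge detection.
    padded = np.diff(np.concatenate(([0], frames.astype(int), [0])))
    starts, ends = np.where(padded == 1)[0], np.where(padded == -1)[0]
    durations = (ends - starts) / fps
    return {
        "gaze_ratio": float(frames.mean()),       # fraction of session in mutual gaze
        "gaze_frequency": len(starts),            # number of episodes
        "mean_episode_sec": float(durations.mean()) if len(durations) else 0.0,
    }
```

Measures like these, computed per setting, activity, or session, could then feed the regression models used for social visual behavior score prediction.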
Immersive Virtual Reality and Robotics for Upper Extremity Rehabilitation
Stroke patients often experience upper limb impairments that restrict their
mobility and daily activities. Physical therapy (PT) is the most effective
method to improve impairments, but low patient adherence and participation in
PT exercises pose significant challenges. To overcome these barriers, a
combination of virtual reality (VR) and robotics in PT is promising. However,
few systems effectively integrate VR with robotics, especially for upper limb
rehabilitation. Additionally, traditional VR rehabilitation primarily focuses
on hand movements rather than joint movements of the limb. This work introduces
a new virtual rehabilitation solution that combines VR with KinArm robotics and
a wearable elbow sensor to measure elbow joint movements. The framework also
enhances the capabilities of a traditional robotic device (KinArm) used for
motor dysfunction assessment and rehabilitation. A preliminary study with
non-clinical participants (n = 16) was conducted to evaluate the effectiveness
and usability of the proposed VR framework. We used a two-way repeated measures
experimental design where participants performed two tasks (Circle and Diamond)
with two conditions (VR only and VR with KinArm). We found no main effect of
condition on task completion time. However, there were significant
differences in both the normalized number of mistakes and recorded elbow joint
angles (captured as resistance change values from the wearable sensor) between
the Circle and Diamond tasks. Additionally, we report the system usability,
task load, and presence in the proposed VR framework. This system demonstrates
the potential advantages of an immersive, multi-sensory approach and provides
future avenues for research in developing more cost-effective, tailored, and
personalized upper limb solutions for home therapy applications.
Comment: Submitted to the International Journal of Human-Computer Interaction.
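The two-way repeated-measures design (task x condition, within subjects) can be analyzed with a standard repeated-measures ANOVA. The sketch below shows such an analysis with statsmodels on synthetic placeholder data; the column names and generated values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
rows = []
for subject in range(16):                      # n = 16 participants
    for task in ("Circle", "Diamond"):
        for condition in ("VR", "VR+KinArm"):
            rows.append({
                "subject": subject,
                "task": task,
                "condition": condition,
                "completion_time": rng.normal(30, 5),  # placeholder measurement
            })
df = pd.DataFrame(rows)

# Tests main effects of task and condition (and their interaction)
# on completion time, with subject as the repeated-measures unit.
result = AnovaRM(df, depvar="completion_time", subject="subject",
                 within=["task", "condition"]).fit()
print(result)
```

The same layout would apply to the other dependent variables reported, such as the normalized number of mistakes or the sensor-derived elbow joint angles.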
Multimodal Assessment of Teaching Behavior in Immersive Rehearsal Environment - TeachLivE™
Nonverbal behaviors such as facial expressions, eye contact, gestures, and body movements in general strongly affect the process of communicative interaction. Gestures play an important role in interpersonal communication between student and teacher in the classroom. To assist teachers in exhibiting open and positive nonverbal signals in their actual classrooms, we have designed a multimodal teaching application with provisions for real-time feedback, in coordination with our TeachLivE test-bed environment and its reflective application, ReflectLivE. Individuals walk into this virtual environment and interact with five virtual students shown on a large-screen display. The study reported here is designed around two settings (each 7 minutes long). In each setting, the participants are provided lesson plans from which they teach. All participants take part in both settings, with half receiving automated real-time feedback about their body poses in the first session (group 1) and the other half receiving such feedback in the second session (group 2). Feedback takes the form of a visual indication each time the participant exhibits a closed stance. To create this automated feedback application, a closed-posture corpus was collected from existing TeachLivE teaching records and used to train the detector. After each session, the participants complete a post-questionnaire about their experience. We hypothesize that visual feedback improves positive body gestures for both groups during the feedback session, and that, for group 1, this improvement persists into their second, unaided session, whereas, for group 2, improvements occur only during the second session.
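As a concrete stand-in for the closed-stance detector driving the visual indicator, the sketch below flags an arms-crossed pose from Kinect-style joint positions; the rule and joint names are hypothetical simplifications of the corpus-trained classifier the study actually used.

```python
def is_closed_stance(joints: dict) -> bool:
    """joints maps names like 'wrist_left' to (x, y, z); x grows toward the body's right."""
    spine_x = joints["spine_base"][0]
    left_wrist_x = joints["wrist_left"][0]
    right_wrist_x = joints["wrist_right"][0]
    # Arms crossed: each wrist has moved past the spine midline.
    return left_wrist_x > spine_x and right_wrist_x < spine_x

# Example frame: both wrists crossed over the midline -> closed stance.
frame = {"spine_base": (0.0, 1.0, 2.0),
         "wrist_left": (0.15, 1.2, 1.9),
         "wrist_right": (-0.12, 1.2, 1.9)}
assert is_closed_stance(frame)
```

In the study, a flag like this raised during teaching would trigger the on-screen visual indication until the participant returned to an open stance.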
Improving Social Communication Skills Using Kinesics Feedback
Interactive training environments typically include feedback mechanisms designed to help trainees improve their performance through guided or self-reflection. When the training system deals with human-to-human communication, as one would find with a teacher, counselor, or cross-cultural trainer, such feedback needs to address all aspects of human communication. This means that, in addition to verbal communication, nonverbal messages (kinesics in particular) must be captured and analyzed for semantic meaning. The goal of this research is to introduce interactive training models developed to improve human-to-human interaction. The specific context in which we prototype and validate these models is the TeachLivE™ teacher rehearsal environment developed at the University of Central Florida. We implemented an online gesture recognition application on top of the Microsoft Kinect software development kit, with multiple feedback channels including visual and haptic cues. In a study of twelve participants rehearsing a teaching session in TeachLivE, we found that the online gesture recognition tool and its associated feedback methods are effective and non-intrusive approaches to communication-skill training. The algorithms employed, the results, and the implications for other interactive contexts are discussed in this paper.
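The multiple feedback channels mentioned above can be organized as a simple dispatcher that fans each recognized gesture out to every registered channel; the channel implementations below (print stand-ins for visual and haptic cues) are hypothetical placeholders, not the study's actual delivery code.

```python
from typing import Callable, List

class FeedbackDispatcher:
    """Routes each recognized gesture event to all registered feedback channels."""

    def __init__(self):
        self.channels: List[Callable[[str], None]] = []

    def register(self, channel: Callable[[str], None]):
        self.channels.append(channel)

    def notify(self, gesture: str):
        for channel in self.channels:
            channel(gesture)

dispatcher = FeedbackDispatcher()
dispatcher.register(lambda g: print(f"[visual] highlight: {g}"))  # on-screen cue
dispatcher.register(lambda g: print(f"[haptic] pulse for: {g}"))  # e.g., wristband buzz
dispatcher.notify("closed_stance")
```

Decoupling recognition from delivery in this way lets a channel be swapped or muted per study condition without touching the recognizer.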